

RECEIVED RECEIVED  
OCT 16 1999 OCT 20 1999

TECH CENTER 2500  
TECH CENTER 2700

## RESPONSE TO RESTRICTION REQUIREMENT

Applicant elects the invention of Group I, claims 1-3, 5, 7-17 and 21-25. Applicant submits that in light of the amendment to claim 6 (making it depend from claim 5), claim 6 is also in Group I and should be examined.

### REMARKS

Applicant respectfully requests reconsideration and allowance of claims 1-3, 5-17 and 21-25 which are pending in the above-identified application. Claim 6 has been amended.

A substitute specification is submitted herewith in response to the Examiner's requirement.

Applicant has elected the invention of Group I and submits that claims 1-3, 5-17, and 21-25 read on the elected invention.

In light of the above, Applicant submits that the instant claims are in condition for allowance. Early and favorable action is earnestly solicited.

I hereby certify that this correspondence is being deposited with the United States Postal Service with sufficient postage as First Class Mail in an envelope addressed to: Asst. Commissioner for Patents, Washington, D.C. 20231, on October 8, 1999

Matthew B. Dernier  
\_\_\_\_\_  
Name of applicant, assignee or  
Registered Representative  
*Matthew B.D.*  
\_\_\_\_\_  
Signature  
\_\_\_\_\_  
October 8, 1999  
Date of Signature  
\_\_\_\_\_  
MBD:dmt

Respectfully submitted,

*Matthew B.D.*  
\_\_\_\_\_  
Matthew B. Dernier  
Registration No.: 40,989  
OSTROLENK, FABER, GERB & SOFFEN, LLP  
1180 Avenue of the Americas  
New York, New York 10036-8403  
Telephone: (212) 382-0700



**-SUBSTITUTE SPECIFICATION-**

*Please enter 2/20/99*

**DATA PROCESSING CIRCUIT, MULTIPLIER UNIT WITH  
PIPELINE, ALU AND SHIFT REGISTER UNIT FOR USE  
IN A DATA PROCESSING CIRCUIT**

**BACKGROUND OF THE INVENTION:**

Nowadays a large number of personal computers make use of processors which have a complex instruction set (CISC). Such processors are provided with a central processing unit, the function of which is adjusted at each clock pulse to perform the desired operation on two operand words. These processors are currently commercially available under Intel code numbers beginning with 80.

Although the clock speed of the function adjustable processors has been increased considerably, the organizational structure of such a processor is a major obstacle to further increases in processing speed. For instance, during multiplying and dividing of two operand words, frequent use must be made of registers internally present in the computer.

So-called work stations often use a pipeline structure with a reduced instruction set (the so-called RISC, Reduced Instruction Set Computer) in order to increase the speed of the work station. This structure provides an increase in speed of so-called vector operations, wherein a large number of data words have to be subjected to the same arithmetic operation. Since a limited instruction set can be implemented efficiently, the execution of a large number of instructions requires only a single clock pulse.

Although the RISC structure achieves an increase in speed for frequently occurring operations (such as multiplications), more complex instructions for particular operations are omitted from the instruction set and, therefore, the speed of executing such operations is not increased. In addition, the processing unit is often designed for data words with a fixed word length, for example 32 or 64 bits.

In EP-A-0173383, a processor for floating point operations is disclosed. Such floating point operations are not useful for image or graphical processing applications, where operations have to be performed on integer data words of 8, 16 or 32 bits.

In the article "The i860™ 64-bit supercomputing microprocessor" by L. Kohn et al., published in the Proceedings of Supercomputing, 13-17 November 1989, Reno, Nevada, USA, 1989, IEEE Computer Society Press, Washington D.C., a RISC-based microprocessor for executing multiplications on either 64 bit or 32 bit words is described. As described above, such a RISC concept does not provide increased speed when integer data words of 8 bits or multiples thereof have to be processed.

Also, in EP-A-0380100, a multiplier is disclosed for processing 32 bit operands to provide two 16 bit by 16 bit fixed point products or one 32 bit floating point product during each clock cycle.

For image and/or graphics processing applications however, operations have to be performed on data words of 8 or 16 bits or a number of mutually associated bytes before even a limited speed increase is achieved in the RISC concept.

The present invention provides a data processing circuit comprising:

- a multiplier unit for multiplying integer data words of 8 bits or multiples thereof having a pipeline and in which the word length is adjustable for multiplying the integer data words;

- an arithmetic logic unit (ALU) having an adjustable word length for performing arithmetic operations on integer data words of 8 bits or multiples thereof;

- a register unit provided with at least two registers for storing the integer data words of 8 bits or multiples of 8 bits on which the operation and/or pipeline multiplication has to be performed; and

- a bus structure which comprises a number of separate buses and which effects the transport of integer data words from and to the multiplier unit, the arithmetic logic unit and the register unit.

The data processing unit according to the present invention achieves a speed, for graphic applications, that is more than twice as great as in existing systems. In contrast to RISC and CISC, the data flow between the above specified circuits (multiplier, ALU etc.) is not fixed. Rather, the programmer is free to program the sequence of the data flow through the different units (free pipeline).

The present invention further provides a multiplier unit with a pipeline for use in a data processing circuit.

The present invention also comprises an arithmetic logic unit for use in a data processing circuit.

Finally, the present invention provides a shift register unit for use in a data processing circuit.


#### **BRIEF DESCRIPTION OF THE DRAWINGS**


Further advantages, features and details of the present invention will be elucidated on the basis of the following description of a preferred embodiment thereof with reference to the annexed drawings, in which:

Fig. 1 shows a functional diagram of a graphic application of a data processing circuit according to the present invention;

Fig. 2 shows an outline diagram of the data processing circuit of Fig. 1;

Fig. 3 shows a functional diagram of the internal structure of the data processing circuit of Fig. 1;

Fig. 4 shows a first functional diagram of the arithmetic logic unit of the diagram of Fig. 3;

Fig. 5 shows a second functional diagram of the arithmetic logic unit of the diagram of Fig. 3;

Fig. 6 shows a functional diagram of the multiplier unit with pipeline of the diagram of Fig. 3;

Fig. 7 shows a functional diagram of a Wallace tree in the diagram of Fig. 6; and

Fig. 8 shows a functional diagram of the shift register unit from the functional diagram of Fig. 3.

**DETAILED DESCRIPTION OF THE INVENTION:**

A data processing circuit 1 (Fig. 1) according to the present invention, also named DISC or IMAGINE, is coupled via a bus 2 to a data memory 3, for instance SRAM (Static Random Access Memory). The data processing circuit 1 is further connected via a bus 4 to a main or video memory 5 for storage of image data, which is constructed from DRAM (Dynamic Random Access Memory) cells or is a (more expensive) VRAM. This main memory 5 drives a RAMDAC (Random Access Memory for a Digital Analog Converter) 7 via bus 6, which in turn provides a monitor (not shown) with the color signals R (red), G (green) and B (blue).

In practical applications the data processing circuit 1 will be coupled via a buffer 9 and access logic 10 to a host processor (not shown). The configuration of Fig. 1 is preferably further provided with an instruction RAM 11 which is coupled via a bus 12 to the data processing circuit 1 as well as via a buffer 112 in which registers and drive means are incorporated. A clock means 13 provides the diverse components of the configuration with clock signals while a circuit 14 is included in the configuration for the video timing. A video input circuit 15 is preferably connected to the bus 6 for feeding video signals to the image memory 5.

The structure of the data processing circuit is shown schematically in Fig. 2 and comprises a parallel multiplier 20 which comprises a RAM 21, an accumulator 22 and a Wallace tree 23. The data processing circuit also comprises a data input and output circuit 24, a parallel shift register 25, a bus structure 26, a circuit 28 for unary operations, a circuit 29 for driving the image memory, a circuit 30 for image input and output, an arithmetic logic unit 31, a circuit 32 for driving the register bank and a vector index generator, a register bank 33, a mask generator 34 which comprises a transparent mask 35, an opaque mask 36, a window mask 37, a line mask 38, a polygon mask 39, a mask assembly means 40 and a range check 41, a circuit with phase-locked loop 42 and a circuit 43 for instruction processing which comprises a program control 44, start-up ROM 45 and an interrupt processing means 46.

The bus structure 26 (Fig. 3) comprises a control SC-bus 51, an A-bus 52, a B-bus 53, a Q-bus 54, an F-bus 55, an M-bus 56, a U-bus 57, a D-bus 58 and a V-bus 59, each of which is, for instance, 32 bits wide.

The register bank 33 is connected via output registers 60 and 61 to the A and B bus respectively. Register bank 33 contains ninety-six inputs which are single 32 bit, double 16 bit or quadruple 8 bit words. Three ports enable simultaneous performance of two read actions and a write action. Sixty-two of the ninety-six registers are directly accessible. The remaining thirty-two inputs are addressed via the vector index generator 32 which can generate a maximum of 12 locations per cycle (i.e., four byte sections for each of the three ports, since each word segment can be selected separately within the registers).
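By way of illustration only, the three word formats of a register entry (single 32 bit, double 16 bit or quadruple 8 bit) can be modelled in software as follows. This is a minimal sketch and not part of the disclosed circuit; the function names `split_lanes` and `join_lanes`, and the choice to number segments from the least significant end, are assumptions made for the example.

```python
def split_lanes(word, lane_bits):
    """Split a 32-bit word into segments of 32, 16 or 8 bits.

    Assumption: segments are returned least significant first.
    """
    if lane_bits not in (8, 16, 32):
        raise ValueError("segment width must be 8, 16 or 32 bits")
    lane_mask = (1 << lane_bits) - 1
    return [(word >> shift) & lane_mask for shift in range(0, 32, lane_bits)]


def join_lanes(lanes, lane_bits):
    """Pack segments (least significant first) back into one 32-bit word."""
    if lane_bits not in (8, 16, 32) or len(lanes) * lane_bits != 32:
        raise ValueError("segments must exactly fill a 32-bit word")
    lane_mask = (1 << lane_bits) - 1
    word = 0
    for index, lane in enumerate(lanes):
        word |= (lane & lane_mask) << (index * lane_bits)
    return word
```

Under these assumptions, the same stored value 0x11223344 reads as one 32 bit word, as two 16 bit words or as four 8 bit words, depending only on the segment width selected.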

The parallel shift register 25 is designed such that it can shift 32 bits of data anywhere from 1 to 32 positions to the left or right in one clock cycle based on the information received via the A-bus 52. The information can be grouped into one, two or four sections of 32, 16 and 8 bits respectively. The shift can take place logically (unsigned), numerically (signed) or by rotation. The operands are received from the B-bus 53 or the F-bus 55. The parallel shift register 25 is connected via a register 62 to the Q-bus 54. Fig. 8 schematically shows an example of a two step rotation of a 32 bit word (consisting of two 16-bit words) through 11 bits in a positive direction by way of four 8 bit rotations and eight 4 bit crossings.
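The behaviour of the parallel shift register described above can be approximated in software as follows. This is a sketch under stated assumptions rather than a description of the hardware of Fig. 8: the function name `shift_word`, the mode names and the treatment of each section as an independent unit are chosen for illustration only.

```python
def shift_word(word, amount, lane_bits=32, mode="logical", left=True):
    """Shift or rotate each 32, 16 or 8 bit section of a 32-bit word.

    mode is "logical" (unsigned), "arithmetic" (signed) or "rotate".
    Assumption: every section is shifted by the same amount.
    """
    if lane_bits not in (8, 16, 32):
        raise ValueError("section width must be 8, 16 or 32 bits")
    if mode not in ("logical", "arithmetic", "rotate"):
        raise ValueError("unknown shift mode")
    if not 0 <= amount <= lane_bits:
        raise ValueError("shift amount out of range")
    lane_mask = (1 << lane_bits) - 1
    sign_bit = 1 << (lane_bits - 1)
    result = 0
    for offset in range(0, 32, lane_bits):
        value = (word >> offset) & lane_mask
        if mode == "rotate":
            n = amount % lane_bits
            if not left:
                # A right rotation equals a left rotation by the complement.
                n = (lane_bits - n) % lane_bits
            shifted = ((value << n) | (value >> (lane_bits - n))) & lane_mask
        elif left:
            shifted = (value << amount) & lane_mask
        elif mode == "arithmetic" and value & sign_bit:
            # Sign-extend, shift, then truncate back to the section width.
            shifted = ((value - (1 << lane_bits)) >> amount) & lane_mask
        else:
            shifted = value >> amount
        result |= shifted << offset
    return result
```

For example, rotating the 32 bit word 0x00000001 through 11 bits to the left yields 0x00000800 in this model.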

With reference to Fig. 3, the arithmetic logic unit 31 (ALU) is connected to the A-bus 52, the Q-bus 54, the M-bus 56, the D-bus 58, the U-bus 57, the B-bus 53, the F-bus 55, again to the U-bus 57 and the V-bus 59. All the usual logic operations of a conventional ALU can be performed by the ALU of the present invention in addition to numerical functions such as addition, subtraction, increment and decrement. The ALU 31 is further provided with a so-called parametric logic function. On the basis of the content of an 8 bit register, the ALU 31 can perform any of the 256 possible logic operations on 3 operands. The standards for X-window and MS-windows specify that logic and graphic operations must be possible in any combination. The parametric function can also be used to realize shifting, masking, combining or comparing operations in a single clock cycle.
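The figure of 256 follows from the fact that a bitwise function of 3 operands has 2^3 = 8 input combinations, so an 8 bit register can hold its complete truth table (2^8 = 256 tables). The following sketch illustrates such a parametric function in software. It is an assumption-laden illustration, not the disclosed implementation: the function name `parametric_logic` and the ordering of the truth-table bits (operand `a` as the most significant index bit) are hypothetical.

```python
def parametric_logic(code, a, b, c, width=32):
    """Apply the 3-operand bitwise function selected by the 8-bit table `code`.

    Assumption: bit (4*a_bit + 2*b_bit + c_bit) of `code` gives the result bit.
    """
    if not 0 <= code <= 0xFF:
        raise ValueError("code must fit in 8 bits")
    mask = (1 << width) - 1
    a, b, c = a & mask, b & mask, c & mask
    result = 0
    for index in range(8):
        if (code >> index) & 1:
            # Build the minterm for this input combination across all bits.
            term_a = a if index & 4 else ~a & mask
            term_b = b if index & 2 else ~b & mask
            term_c = c if index & 1 else ~c & mask
            result |= term_a & term_b & term_c
    return result
```

With this bit ordering, code 0xC0 gives `a AND b`, code 0x3C gives `a XOR b`, and code 0xE4 selects `a` where `c` is set and `b` elsewhere, which is one way a masking or combining operation could be expressed as a single parametric operation.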

The ALU 31 can be adjusted as a single, double or quadruple parallel unit for 32, 16 and 8 bit operands respectively. The data coming from the A-, Q-, — or D-buses determines the selection of the size of the operands to be processed. A mode selector 63 is connected to the ALU 31 and generates a status signal on output 64. The ALU 31 is further connected to the F-bus 55 via an output register 64. Fig. 4 shows a functional diagram of the ALU for a parallel quadruple operation on operands of 24 bits, while Fig. 5 shows a functional diagram of a double operation with 48 bit operands. In Fig. 5, two selectors and two accumulators, each of 8 bits, are combined.
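The single, double or quadruple parallel adjustment of the ALU can be pictured with the following software sketch of a partitioned addition. It is offered as an illustration under assumptions, not as the circuit of Figs. 4 and 5: the function name `partitioned_add` is hypothetical, and carries out of a section are simply discarded (wrap-around), which the text above does not specify.

```python
def partitioned_add(a, b, lane_bits=32):
    """Add two 32-bit words as one 32 bit, two 16 bit or four 8 bit operands.

    Assumption: a carry out of a section is dropped and never propagates
    into the neighbouring section.
    """
    if lane_bits not in (8, 16, 32):
        raise ValueError("operand width must be 8, 16 or 32 bits")
    lane_mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, 32, lane_bits):
        lane_sum = ((a >> shift) & lane_mask) + ((b >> shift) & lane_mask)
        result |= (lane_sum & lane_mask) << shift
    return result
```

The same pair of input words therefore gives different results depending on the selected operand size: adding 0x00FF00FF and 0x00010001 yields 0x01000100 in double 16 bit mode, but 0x00000000 in quadruple 8 bit mode, because the carry is blocked at each 8 bit boundary.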

The multiplier 20 is embodied as a pipeline with five clock cycles. The multiplier is capable of performing pipeline operations on 32 bit, 16 bit and 8 bit words. All possible multiplication operations with numbers, signed and unsigned, or a combination thereof, in addition to execution of the multiplication of 16 bit complex numbers and 8 bit matrices with vectors, are possible due, inter alia, to the presence of a Wallace tree (Fig. 7). The multiplier operates internally with 48 bit results or double 24 bit or quadruple 12 bit values, two of which are transported simultaneously via 96 bit data channels. Fig. 6 shows a functional diagram of the multiplier with five clock levels. The multiplier is connected to the M-bus 56 via an output register 66.
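The multiplication of 32 bit, 16 bit or 8 bit words with signed, unsigned or mixed operands can be sketched in software as follows. This is a functional illustration only; it does not model the five pipeline stages, the Wallace tree or the internal result widths mentioned above. The names `to_signed` and `partitioned_multiply`, and the return of one full-precision product per section, are assumptions.

```python
def to_signed(value, bits):
    """Interpret an unsigned `bits`-wide value as two's complement."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def partitioned_multiply(a, b, lane_bits=32, signed_a=False, signed_b=False):
    """Multiply matching sections of two 32-bit words.

    Returns one product per section, least significant section first.
    Each operand can be treated as signed or unsigned independently.
    """
    if lane_bits not in (8, 16, 32):
        raise ValueError("word length must be 8, 16 or 32 bits")
    lane_mask = (1 << lane_bits) - 1
    products = []
    for shift in range(0, 32, lane_bits):
        x = (a >> shift) & lane_mask
        y = (b >> shift) & lane_mask
        if signed_a:
            x = to_signed(x, lane_bits)
        if signed_b:
            y = to_signed(y, lane_bits)
        products.append(x * y)
    return products
```

In quadruple 8 bit mode, for instance, the words 0x02030405 and 0x0A0A0A0A give the four products 50, 40, 30 and 20 in this model.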

The circuit for unary operations 28 converts data, for instance, binary to unary (linear), indicates the position of the most significant bit, determines the absolute value of a signed value and reverses the bit sequence of a word. Circuit 28 can operate on a word of 32, 16 or 8 bits.
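The four unary operations listed above can be illustrated with the following sketch. It rests on assumptions that the text does not confirm: "binary to unary" is read here as producing a run of that many one-bits (a thermometer code), signed values are taken to be two's complement, and all function names are hypothetical.

```python
def binary_to_unary(count, width=32):
    """Return a word whose `count` least significant bits are set (assumed reading)."""
    if not 0 <= count <= width:
        raise ValueError("count does not fit in the word")
    return (1 << count) - 1


def msb_position(value):
    """Index of the most significant set bit, or -1 if the value is zero."""
    return value.bit_length() - 1


def absolute_value(value, width=32):
    """Absolute value of a two's complement word, truncated to `width` bits."""
    mask = (1 << width) - 1
    value &= mask
    if value & (1 << (width - 1)):
        value = (-value) & mask
    return value


def reverse_bits(value, width=32):
    """Reverse the bit sequence of a `width`-bit word."""
    mask = (1 << width) - 1
    return int(format(value & mask, "0{}b".format(width))[::-1], 2)
```

Note that in this model the most negative value (for example 0x80 at 8 bits) maps onto itself, as is usual for two's complement negation.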

The mask generator 34 has a number of independent sub-units. The window mask 37 determines within which regions the other operations must fall. The circuit 41 for range checking operates on the basis of pre-defined patterns and, therefore, one of its most important applications is generating letter characters. The circuit 41 also serves to check three-dimensional pixel data, such as depth and color.

The line mask 38 generates a horizontally defined pattern between a predetermined beginning and end. The line mask 38 can generate up to four lines simultaneously and supports, for instance, the creation of polygons. A shape along a horizontal line of the image can be produced using the line mask 38, when no interruptions occur along the line.
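A horizontal pattern between a beginning and an end can be represented as a bit mask over a row of pixels, as in the following sketch. The mapping of pixel position `i` to bit `i`, the inclusive end position, the 32 pixel row width and the function names are all assumptions made for illustration.

```python
def line_mask(begin, end, width=32):
    """Mask with bits begin..end (inclusive) set; bit i stands for pixel i."""
    if not 0 <= begin <= end < width:
        raise ValueError("span must lie inside the row")
    return ((1 << (end - begin + 1)) - 1) << begin


def combined_line_mask(spans, width=32):
    """Overlay up to four spans, mirroring the four simultaneous lines."""
    if len(spans) > 4:
        raise ValueError("at most four lines are assumed here")
    mask = 0
    for begin, end in spans:
        mask |= line_mask(begin, end, width)
    return mask
```

For example, the spans (0, 3) and (8, 11) together give the mask 0x00000F0F under these assumptions.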

The polygon mask 39 serves to generate elements for which the line generator is not suitable, for instance, Chinese characters. The polygon mask 39 defines the number of contour transitions on the horizontal lines passing through a relevant pixel.
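Counting contour transitions along the horizontal line through a pixel is the basis of the well-known even-odd fill test, sketched below. Whether the polygon mask 39 applies the even-odd rule or some other rule to the count is not stated in the text, so the `inside_even_odd` function is an assumption, as are the function names and the representation of a contour as a list of vertices.

```python
def crossings_left_of(x, y, polygon):
    """Count contour edges crossed by the horizontal line through (x, y), left of x."""
    count = 0
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        # Half-open test so a vertex lying on the line is counted only once.
        if (y0 <= y) != (y1 <= y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x_cross < x:
                count += 1
    return count


def inside_even_odd(x, y, polygon):
    """Assumed fill rule: a pixel is inside when the transition count is odd."""
    return crossings_left_of(x, y, polygon) % 2 == 1
```

For a square with corners (0, 0), (4, 0), (4, 4) and (0, 4), the point (2, 2) sees one transition to its left and is inside, while the point (5, 2) sees two and is outside.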

The mask assembly 40 performs the function of overlaying diverse masks. The results from the mask assembly 40 are transmitted to the respective transparent and/or opaque masks 35, 36 where the actual image for display is created. The transparent and opaque masks 35, 36 can both contain a maximum of 128 pixels in a matrix of 4 x 32.

The circuit for data input and output 24 is connected to a 32 bit data channel and a 32 bit address bus. The range for addressing comprises 32 Mbyte.

The entry of instructions takes place under the control of the program control unit 44. With a 22 bit address, a following instruction word is continuously assigned which is subsequently entered via a separate 64 bit bus. The program memory can have a size of 4M x 64 bits.
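The stated program memory size follows directly from the address and instruction widths: a 22 bit address selects one of 2^22 instruction words. A short check of the arithmetic is given below; the constant names are chosen for the example only.

```python
ADDRESS_BITS = 22        # width of the instruction address
INSTRUCTION_BITS = 64    # width of one instruction word

# Number of addressable instruction words: 2**22 = 4,194,304, i.e. 4M.
instruction_words = 1 << ADDRESS_BITS

# Total program memory in bytes: 4M words of 8 bytes each.
program_bytes = instruction_words * INSTRUCTION_BITS // 8
```

This gives 4M instruction words of 64 bits, or 32 x 2^20 bytes in total.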

The drive of the image memory 29 is adapted to generate an address on the basis of an X/Y position so that any random image segment can be addressed on the basis of its location in the image and in the image memory. The image memory is also suitable for storing other data banks such as lists and data banks with graphic elements.
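One common way to derive a memory address from an X/Y position is a linear, row-by-row layout, sketched below. The text does not say which mapping the drive 29 uses, so the layout itself and every parameter (`base`, `pitch`, `bytes_per_pixel`) are hypothetical and serve only to make the idea concrete.

```python
def pixel_address(x, y, base=0, pitch=1024, bytes_per_pixel=1):
    """Address of pixel (x, y) in an assumed row-major image layout.

    base            -- hypothetical start address of the image
    pitch           -- hypothetical number of pixels per image row
    bytes_per_pixel -- hypothetical storage size of one pixel
    """
    if x < 0 or y < 0 or x >= pitch:
        raise ValueError("position lies outside the image row")
    return base + (y * pitch + x) * bytes_per_pixel
```

With a 640 pixel row and 4 bytes per pixel, for example, position (3, 2) of an image stored at 0x1000 maps to address 9228 in this model.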

When a clock frequency of 66 MHz is used for a data processing circuit according to the present invention, it is possible to operate the system such that the access time for the memory is 70 ns.

The data processing circuit can be programmed in a higher programming language, such as C, so that it is easily programmed, as in RISC and CISC processing units. The data processing circuit 1 can be programmed with instructions according to the RISC concept as well as with the CISC instructions of a personal computer. In order to achieve a large increase in speed for graphic applications, the programmer can program all functions of the data processing circuit 1 at a lower level via an instruction field of 64 bits. The ALU 31 and the multiplier unit can be set to parallel operations, whereby the speed for graphic applications can be increased by a factor of 4-20 as compared to existing RISC processors. For a particular application, a programmer will set a "once-only" series of instructions and control registers. Subsequently, the programmer will start the processor with one command, whereafter the processor independently processes the pixel flows.

As an example of the speed increase which can be gained by way of the present invention, an algorithm consisting of five instructions for rotating and interpolating a color image is presented, which can accommodate a total of 38 instructions, that is:

read 2 x 16 bit register;  
increment 2 x 16 bit register address;  
read 1 x 10 bit constant;  
shift 2 x 16 bit word;  
read 2 x 16 bit constant;  
add 2 x 16 bit value;  
read out 4 x 8 bit 2D memory data;  
read out 4 x 8 bit image memory data;  
increment 1 x 32 bit image memory address;  
multiply 4 x 8 bit value;  
read 4 x 12 bit accumulator register;  
accumulate 4 x 12 bit value;  
write 4 x 12 bit accumulator register; and  
increment 2 x 5 bit register address accumulator.
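The multiply and accumulate steps of the list above can be illustrated with a software sketch of interpolating four 8 bit color values held in one 32 bit word. This is a hypothetical reconstruction, not the algorithm of the specification: the function name `interpolate_pixel`, the use of integer weights that sum to 16, and the rounding step are all assumptions. The weights are chosen so that each accumulated value stays below 4096 and therefore fits in a 12 bit accumulator.

```python
def interpolate_pixel(pixels, weights):
    """Weighted average of packed 4 x 8 bit pixels using four accumulators.

    pixels  -- 32-bit words, each holding four 8 bit color values
    weights -- assumed integer weights, one per pixel, summing to 16
    """
    if len(pixels) != len(weights) or sum(weights) != 16:
        raise ValueError("need one weight per pixel, summing to 16")
    accumulators = [0, 0, 0, 0]
    for pixel, weight in zip(pixels, weights):
        for lane in range(4):
            value = (pixel >> (8 * lane)) & 0xFF
            # Largest possible total is 255 * 16 = 4080, which fits in 12 bits.
            accumulators[lane] += value * weight
    result = 0
    for lane in range(4):
        # Divide by 16 with rounding to return to an 8 bit color value.
        result |= ((accumulators[lane] + 8) >> 4) << (8 * lane)
    return result
```

Averaging a black pixel 0x00000000 and a white pixel 0xFFFFFFFF with equal weights of 8, for instance, gives 0x80808080 in this model.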

The data processing circuit according to the present invention can be built into specific equipment but can also be embodied as an extension card for a personal computer. Owing to the flexible utilization of the hardware, even at lower clock speeds than, for instance, 200 MHz, which is currently among the highest, from 5 to 20 times improvement in image processing speed can be obtained. This makes the data processing circuit according to the present invention suitable for real-time video operations and so-called virtual reality.

Since it is practically impossible to describe all the possibilities of the present invention on account of its complexity, a product specification incorporated by reference, insofar as this is completed, is appended as an annex. As is usual in this technical field, this specification is written in the English language. After completion it will become part of the public domain, probably within a year.

**WHAT IS CLAIMED IS:**

1. A circuit for processing integer data for graphic image processing applications, comprising:

a multiplier unit having a pipeline for multiplying integer data words of 8 bits or multiples thereof, the pipeline being adjustable to the length of the integer data words to be multiplied;

an arithmetic logic unit (ALU) for performing arithmetic operations on integer data words of 8 bits or multiples thereof, the word length of the ALU being adjustable in accordance with the multiple of 8 bits constituting the integer data words;

a register unit provided with at least two registers for storage of the integer data words on which one of the operation and pipeline multiplication has to be performed; and

a bus structure for effecting the transport of integer data words from and to the multiplier unit, the arithmetic logic unit and the register unit, the bus structure having a plurality of separate buses each having a separate register connected thereto for transmitting and receiving the integer data words.

2. The circuit according to claim 1, wherein the pipeline is a five-step pipeline.

3. The circuit according to claim 1, wherein the integer data comprises one of 32 bit words, 16 bit words, and 8 bit words.

4. A multiplier unit having a pipeline and a variable length accumulator, the multiplier having a word length which is adjustable and the multiplication is performed in accordance with the length of integer data words being multiplied, the length of the integer data words being 8 bits or a multiple thereof.

5. An arithmetic logic unit comprising a plurality of partitioned arithmetic logic units therein, the word length of the arithmetic logic unit being adjustable in accordance with the length of the integer data words being processed, the length of the integer data words being 8 bits or multiples thereof.

6. A shift register unit having control logic capable of receiving an integer data word having a length which is variable in increments of 8 bits, the shift register unit for shifting a 32 bit integer data word through a distance of 1 to 32 bits, in one of a left and a right direction and in one of a rotating and a non-rotating manner.

7. The circuit according to claim 1, in integrated form.

8. The circuit as claimed in claim 1, further comprising an instruction register, wherein the bus structure is provided with a plurality of registers and wherein the transport of the integer data words from and to the multiplier unit, the arithmetic logic unit and the register unit is programmable from the instruction register.

9. The circuit according to claim 2, wherein the integer data comprises 32 bit or 16 bit words.

10. The circuit according to claim 2, wherein the integer data comprises 32 bit or 16 bit words.

11. The circuit according to claim 3, wherein the integer data comprises 32 bit or 16 bit words.

12. The circuit as claimed in claim 2, further comprising an instruction register, wherein the bus structure is provided with a plurality of registers and wherein the transport of the integer data words from and to the multiplier unit, the arithmetic logic unit and the register unit is programmable from the instruction register.

13. The circuit as claimed in claim 3, further comprising an instruction register, wherein the bus structure is provided with a plurality of registers and wherein the transport of the integer data words from and to the multiplier unit, the arithmetic logic unit and the register unit is programmable from the instruction register.

14. The circuit as claimed in claim 7, further comprising an instruction register, wherein the bus structure is provided with a plurality of registers and wherein the transport of the integer data words from and to the multiplier unit, the arithmetic logic unit and the register unit is programmable from the instruction register.

15. A circuit for processing digital data words, comprising:

a multiplier unit for multiplying the digital data words, the multiplier unit having a pipeline in which the word length is adjustable to match the length of the digital data words, the digital data having a length which varies incrementally;

an arithmetic logic unit (ALU) capable of performing arithmetic operations on the digital data words, the ALU being adjustable to match the length of the digital data words;

a register unit having at least two registers for storage of the digital data words; and

a bus structure for transporting the digital data words from and to the multiplier unit, the arithmetic logic unit and the register unit, the bus structure having a plurality of separate buses each having a register connected thereto for transmitting and receiving the digital data words.

16. The circuit according to claim 15, wherein the pipeline is a five-step pipeline.

17. The circuit according to claim 15, wherein the multiplier unit further comprises a Wallace tree.

18. The circuit according to claim 15, wherein the digital data words have a length which varies in increments of 8 bits.

19. A multiplier unit for multiplying digital data words having a length which varies incrementally, the multiplier unit having a pipeline, the pipeline having a word length which is adjustable to match the length of the digital data words.

20. An arithmetic logic unit capable of performing arithmetic operations on digital data words having a length which varies incrementally, the arithmetic logic unit being adjustable to match the length of the digital data words.




**ABSTRACT OF THE DISCLOSURE**

The present invention provides a circuit for processing integer data, especially for graphic applications, having a multiplier unit which includes a pipeline in which the word length is adjustable for multiplying integer data words of 8 bits or multiples thereof; an arithmetic logic unit (ALU) for performing arithmetic operations on integer data words, the word length of which is adjustable in 8 bits or multiples thereof; a register unit provided with at least two registers for storage of integer data words having multiples of 8 bits on which the operation and/or pipeline multiplication has to be performed; and a bus structure having a number of separate buses which effects the transport of integer data words from and to the multiplier unit, the arithmetic logic unit and the register unit.